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Continuous-time random walks (CTRWs) on discrete state spaces, ranging from regular lattices to complex 
networks, are ubiquitous across physics, chemistry, and biology. Models with coarse-grained states, for exam¬ 
ple those employed in studies of molecular kinetics, and models with spatial disorder can give rise to memory 
and non-exponential distributions of waiting times and first-passage statistics. However, existing methods for 
analyzing CTRWs on complex energy landscapes do not address these effects. We therefore use statistical 
mechanics of the nonequilibrium path ensemble to characterize first-passage CTRWs on networks with arbi¬ 
trary connectivity, energy landscape, and waiting time distributions. Our approach is valuable for calculating 
higher moments (beyond the mean) of path length, time, and action, as well as statistics of any conservative 
or non-conservative force along a path. For homogeneous networks we derive exact relations between length 
and time moments, quantifying the validity of approximating a continuous-time process with its discrete-time 
projection. For more general models we obtain recursion relations, reminiscent of transfer matrix and exact 
enumeration techniques, to efficiently calculate path statistics numerically. We have implemented our algo¬ 
rithm in PathMAN, a Python script that users can easily apply to their model of choice. We demonstrate the 
algorithm on a few representative examples which underscore the importance of non-exponential distributions, 
memory, and coarse-graining in CTRWs. 


I. INTRODUCTION 

We can model many dynamical systems in physics, 
chemistry, and biology as random walks on discrete state 
spaces or network structures. For example, random walks 
can represent proteins folding on a coarse-grained net¬ 
work of conformational states 1,2 , particles diffusing in 
disordered, fractal-like media 3,4 , populations evolving in 
DNA or protein sequence space 5,6 , and cells differentiat¬ 
ing across epigenetic landscapes of regulatory states 7,8 . 
One can also use random walks to probe the structure 
of empirical complex networks, such as protein-protein 
interaction networks or the World Wide Web 9-11 . The 
central problem in these models is characterizing the sta¬ 
tistical properties of paths taken by the system as it 
evolves from one state to another, especially for systems 
out of equilibrium. This entails understanding not only 
the distribution of lengths and times for these paths, but 
also their distribution in the state space, which may re¬ 
veal bottlenecks and indicate the diversity of intermedi¬ 
ate pathways. 

There is extensive literature for random walks on lat¬ 
tices 4,12,13 , fractals 3,4 , and random and scale-free net¬ 
works 9,11,14 in the absence of an energy landscape or 
other objective function. Much of this work has focused 
especially on the scaling behavior of first-passage times 
and the mean square displacement, the latter being im- 
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portant to identifying anomalous diffusion 3 . More com¬ 
plex models, especially those with energy landscapes de¬ 
rived from empirical models or experimental data, gener¬ 
ally require numerical approaches. One such approach is 
transition path theory 15 , which relies on numerical solu¬ 
tions of the backward equation. This technique has been 
used to study Markov state models of molecular kinetics 
such as protein folding 1,2 . 

Standard transition path theory, however, is not 
applicable to general continuous-time random walks 13 
(CTRWs) where states may have non-exponential wait¬ 
ing time distributions, nor does it address the com¬ 
plete distribution of first-passage times beyond the mean. 
These problems are important in many systems. For ex¬ 
ample, molecular Markov state models require grouping 
large numbers of microscopic conformations of molecules 
into a small number of effective states 16 ; the stochastic 
dynamics are then analyzed on this effective model 1,2 . 
However, this coarse-graining is known to lead to qualita¬ 
tive differences with the underlying microscopic dynam¬ 
ics 16 . In particular, the loss of information due to coarse- 
graining can lead to the appearance of memory, mani¬ 
fested as non-exponential waiting time distributions, in 
the coarse-grained states. Indeed, there is evidence of 
non-exponential distributions of time in protein confor¬ 
mation dynamics 17,18 and enzyme kinetics 19,20 . Non¬ 
exponential distributions can also arise from spatial dis¬ 
order, as in glassy systems 21,22 . Other linear algebra- 
based methods have been developed to treat general 
CTRWs 23-25 , but such methods are complicated and pro¬ 
vide relatively little physical insight. 

An alternative, more intuitive approach to CTRWs 
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uses the path representation: statistical properties of 
CTRWs are decomposed into averages over the ensem¬ 
ble of all possible stochastic paths through state space. 
Some analytical results with this approach for arbitrary 
energy landscapes and waiting time distributions have 
been obtained, but only for ID lattices, due to the dif¬ 
ficulty of enumerating paths 26-28 . On the other hand, 
path sampling methods 29 are able to treat arbitrary net¬ 
work topologies, but these methods have not been devel¬ 
oped for non-exponential waiting time distributions, and 
in any case sampling is likely to be inefficient for calculat¬ 
ing higher moments of path statistics, which are crucial 
when non-exponential distributions are expected. 

Here we develop a generalized formalism for the path 
ensemble of a CTRW on a network of discrete states, 
regardless of their connectivity, energy landscape, or in¬ 
termediate state waiting time distributions. In Sec. II 
we use statistical mechanics of the nonequilibrium path 
ensemble for a CTRW to obtain expressions for arbi¬ 
trary moments of path statistics including path length, 
time, action, and any conservative or nonconservative 
force along a path. We use this formalism to de¬ 
duce general relationships among the distributions of 
path time, length, and action, as well as several ex¬ 
act relationships for the case of homogeneous networks. 
In Sec. Ill we derive recursion relations, reminiscent 
of transfer matrix and exact enumeration techniques, 
to efficiently calculate various path statistics numeri¬ 
cally, including distributions of paths in the state space. 
We have implemented our approach in a user-friendly 
Python script called PathMAN (Path Matrix Algorithm 
for Networks), freely available at https: //github. com/ 
michaelmanhart/pathman, that users can apply to their 
own models. 

In Sec. IV we demonstrate the numerical algorithm on 
a few examples. After illustrating some basic concepts 
on a simple ID random walk, we apply our method to 
a ID comb to show how coarse-graining can lead to the 
appearance of memory, in the form of non-exponential 
waiting time distributions. We quantify the effect of the 
memory on the distribution of total path times. We fur¬ 
ther demonstrate the effect of coarse-graining in a 2D 
double-well potential, from which we deduce some gen¬ 
eral properties of memory arising from coarse-graining. 
Lastly, we use our method to show how spatial disorder 
can also lead to non-exponential distributions of path 
statistics in the 2D random barrier model. 


II. DISTRIBUTIONS IN THE PATH ENSEMBLE 
A. Continuous-time random walks and memory 

Consider a stochastic process on a finite set S of N 
states: the process makes discrete jumps between states 
with continuous-time waiting at each state in between 
jumps. Such a process is known as a continuous-time 
random walk 13 (CTRW), and it can describe many phys¬ 


ical or biological systems, such as a protein traversing 
a coarse-grained network of conformations toward its 
folded state 1,2 or a particle traveling through a disor¬ 
dered material 3,4 . The time the system waits in a state 
a before making a jump to cr' is distributed according to 
ip{t\a — > cr'). In many models this distribution depends 
only on the current state a and not on the destination cr', 
so that ip(t\a —t a') = ^(t|cr); such waiting time distribu¬ 
tions are known as “separable” 30 . We will mostly assume 
separable distributions throughout this paper. However, 
since non-separable distributions arise crucially in coarse¬ 
grained models, we will also briefly discuss how to extend 
our results to the non-separable case. Let the raw mo¬ 
ments of the waiting time distributions be denoted as 

pOO 

0(")(<r)= / dt ip(t\a) t n . (1) 

Jo 

We assume every state a has at least one finite moment 
for n > 0; the zeroth moment is always 9^ (a) = 1 
by normalization. In the special case of a discrete time 
process, ip(t |cr) = S(t — ^^(cr)) and the moments are 
9^(a) = {9^(a)) n . 

Given the system has finished waiting in cr and makes 
a jump out, the probability of jumping to cr' is given by 
the matrix element (cr' | Q | cr), where Q is an N x N ma¬ 
trix and | cr) denotes an iV-dimensional vector with 1 at 
the position corresponding to the state cr and 0 every¬ 
where else. The jump probabilities out of each state cr 
are therefore normalized according to X] (T '( cr, |Ql tT ) = 1> 
with (er|Q|er) = 0 by definition (since a jump must leave 
the current state). The matrix Q imposes a network 
structure over the states in S, with edges directed and 
weighted by the entries in Q. We can think of the jump 
process alone as a discrete-time projection of the model, 
since it describes the system’s dynamics if we integrate 
out the continuous waiting times. 

An ordinary Markov process is a special case of the 
above CTRW construction. A continuous-time Markov 
process is typically defined by a rate matrix W such that 
in a small time interval At, the probability of making a 
jump cr —> a' is (cr'|W|cr)At. Therefore the probability 
of making the jump cr — > cr', given that the system makes 
any jump out of cr during At, is 


(cr , |W|c7)At 

]T T „(cr"|W|cr)At 


(cr'|W|cr) 
YJa" (<7"|W|<7) 


(cr'|Q|cr), (2) 


which defines the relation between the Markov rate ma¬ 
trix W and the jump matrix Q. The probability per 
unit At of waiting time t = MAt in cr and then making 
a jump out is given by 
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The waiting time distribution ip{t\a) is then the contin¬ 
uous limit of Eq. 3: 




t/At 


0(!)(ct) 


o-t/6 W (a) 


where 


( 4 ) 


-1 


0 {1) (<r)= £>'|W|<7> 


( 5 ) 


is the mean waiting time in a. Hence waiting times 
in a Markov process always have an exponential dis¬ 
tribution. The higher moments of exponential waiting 
times are completely determined by the mean: 9^ (a) = 

n\(eW(a)) n . 

In general, processes with exponential distributions of 
times p(t) are important because they are memoryless in 
the following sense: the probability of taking at least time 
t, given the process has already taken at least time t 0 , 
is the same as taking at least t in the first place. That 
is, the system “forgets” the time it has already taken. 
Mathematically this means that 


P(t +1 o) 

P(to) 


P(t ), 


( 6 ) 


where P(t) = f t °° (it,' p(t') is the complementary cumu¬ 
lative distribution function. The only function satisfying 
Eq. 6 is a simple exponential P(t) = e -t / r , from which it 
follows that p(t) = r~ 1 e~ t P. For waiting time distribu¬ 
tions ip(t\cr), non-exponential functions are therefore in¬ 
dicative of memory within a state: how much longer the 
system tends to wait in that state depends on how long 
it has already waited. In contrast to ordinary Markov 
models where ij){t\a) is always exponential, models where 
ij){t\(j) may be non-exponential are sometimes known as 
semi-Markov processes. 


B. The ensemble of first-passage paths 


Let Sfinai be the set of final states, which we will treat 
as absorbing ((cr'|Q|(j) = 0 for all a £ S'flnai and a' £ S) 
so that the first-passage condition is satisfied. Define a 
path p of length i to be an ordered sequence of i + 1 
states: <p = {cr 0 , cri,..., aej. Denote the probability dis¬ 
tribution over initial states as 7To(cr). Then the probabil¬ 
ity density of starting in a state cr 0 and completing the 
path (p at exactly time t is given by 


p[(fi,t] =7r 0 (cr 0 ) |^(cr,; + i|Q|cr i )j 

■ /»oo /»oo 

x / dt, 0 1 p(t 0 \<T Q ) / dti ^(tilui) 
Jo Jo 

nOO / t—\ 

J die- 1 d ( t - y^ t 


( 7 ) 


where to,t±,..., tg-i are the intermediate waiting times 
and 6 is the Dirac delta function. The probability of 
completing the path (p irrespective of how much time it 
takes is then 


v[<p\= dt v[tp,t\ = 7r 0 (cr 0 ) TT(0i+i|Q|0i). (8) 

J 0 2—0 

The time-independent path probability V[p\ is conve¬ 
nient because we can express many path statistics of in¬ 
terest as averages over this distribution, analogous to av¬ 
erages over the Boltzmann distribution in ordinary statis¬ 
tical mechanics 29,32,33 . For example, let P[p] be a func¬ 
tional that measures some property of the path <p. We 
use angle brackets to denote the average of this quantity 
over the path ensemble: 

{p) = Y J p m<p\, ( 9 ) 

where the sum is over all first-passage paths ip of any 
length ending at states in <Sfi na i. Note that the partition 
function of the first-passage path ensemble, V\p\, al¬ 
ways equals 1, since the process must reach one of the 
final states eventually. In this manner we can calculate 
nonequilibrium (first-passage) properties of the system 
as equilibrium properties of the path ensemble, which is 
time-independent by construction. 


We approach CTRWs using the ensemble of first- 
passage paths 26-29,31-33 , which first reach a particular 
final state or a set of final states from some initial con¬ 
ditions. We are interested in statistical properties of 
this ensemble such as its distributions in length, time, 
and space. In addition to situations where first-passage 
properties themselves are of interest, first-passage paths 
constitute fundamental building blocks of a stochastic 
process since the full propagator and steady state can in 
principle be derived from them 4 . 


C. Distribution of path lengths 

The simplest path property is its length C[ip\, i.e., the 
discrete number of jumps along the path. The mean path 
length is then 

<£} = 5 >M/;m. (io) 

V 5 
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Functionals for the higher moments of path length are 
simply powers of the length functional, 


<£"} = £ pm mr, (ii) 

and the path length probability distribution is 


p (0 = Qi,c )> ( 12 ) 


r w M = 


rOO rOO 

/ dt 0 ip(t 0 \a 0 ) / dtt ■■■ 

Jo Jo 

roc n -1 \ n 

/ dtt -1 V’(^-il^-i) V'ti 

/ (16) 

= E 


30i3ii'”t3£-i 

x 6^°\ao)0^ 1 \ai) ■ ■ ■ 9^ t ~ 1 \ae-i). 


where <5 is the Kronecker delta. Note that the distribu¬ 
tion of path lengths depends only on the jump matrix 
Q and not on the waiting time distributions ip(t\cr), and 
hence it can be thought to characterize the discrete-time 
projection of the underlying continuous-time stochastic 
process. In Appendix A we show that the distribution of 
path lengths p(£) is typically exponential asymptotically: 

P (£) ~ (13) 

where a is a constant of order 1 and i = (C) is the mean 
path length. 


D. Distribution of path times 

In contrast to the discrete length of a path, there is also 
the continuous time of the path that accounts for the 
variable waiting times at the intermediate states. The 
distribution of total path times (first-passage time distri¬ 
bution) is 


Each summation in the multinomial expansion is from 0 
to n subject to the constraint jo + ji + ■ ■ ■ + jt-i = n. 
For example, the first few moments are 

T ( 1 ) M] =yy i) (er i ), 

2—0 

T (2) M =E 0{2) (^) + 2 E 0(1) (^ (1) (^)’ 

2—0 i<j 

t (3) m=E* (3) (^) 

2—0 

+ 3 E + 6 |(2) (f7 i )6 ,(1) (c7 j )) 

i<3 

+ 6 E 0 ( ' 1) ( a i)0 {1 \ a i)0 W ( a k)- 

i<j<k 

(17) 

Note that Eq. 16 implies that if any accessible intermedi¬ 
ate state has a divergent waiting time moment of order n , 
then all path time moments of order n and higher must 
be divergent as well. 


f (0 = J 2 V ^’ t \- ( 14 ) 

v 

Unlike the path length distribution, the path time distri¬ 
bution depends on both the jump matrix Q as well as the 
waiting time distributions i/j(t\a). We cannot evaluate 
/(£) for arbitrary waiting time distributions ^>(£|<r); how¬ 
ever, we can express its moments as simple averages over 
the time-independent path ensemble using path function¬ 
als (cf. Eq. 9). That is, using Eqs. 7 and 14, we obtain 


E. Path action and a general class of path functionals 

For many systems it is important to determine whether 
their dynamics are highly predictable or highly stochas¬ 
tic; that is, whether the system is likely to take one of a 
few high-probability paths every time, or whether there 
is a more diverse ensemble of paths with similar proba¬ 
bilities. One way to quantify this notion uses the path 
action, defined as 


r OO r OO 

/ dtf{t)t n = / dt t n ’y^T[p, t] 

Jo Jo ^ 

= E^M'T (n) M (15) 

<p 

= (r w ), 

where the functional for the nth moment of path time is 


t-i 

<%] = -E log ( CTi+i IQI cri }’ ( 18 ) 

2—0 

so that path probability is V[<p\ = 7To((7o)e -S M As 
in classical and quantum mechanics, paths with min¬ 
imum action dominate while paths of large action are 
suppressed. Note that like path lengths, action depends 
only on the jump probabilities and not on the waiting 
time distributions. 
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The mean path action is the Shannon entropy of 
the path distribution 34 (we ignore the path-independent 
log7To(<ro) contribution from the initial condition): 

(<S) = -5>M l°gV[y\. (19) 

<p 

This is consistent with the idea that the path action dis¬ 
tribution tells us about the diversity of paths in the en¬ 
semble: low entropy (small mean action) means that a 
few paths with large probability dominate the process, 
while large entropy (large mean action) means that a di¬ 
verse collection of low-probability paths contribute. The 
distribution of actions around this mean may be non¬ 
trivial, however. For instance, even if the mean action is 
large, the variance around it could either be small (the 
system must traverse one of the low-probability paths) or 
large (the system may traverse paths with a wide range of 
probabilities). We can characterize the action distribu¬ 
tion by considering its higher moments. The functional 
for the nth moment of path action is 


.Jo, Ji, ■ ■ ■ ,je-i 


( 20 ) 


(<%])" = -E 10 ®^ 1 ^ 0 ^ 

V i=C 

= E 

JJ(—l°g( cr i+l|Q| f 7 i)) ,7i , 

2=0 

so the total moments of the path action distribution are 


i-l 
X 

2=0 


<<H=5>M(«f. (21) 


evolutionary theory 35 and biochemical networks 36 . How¬ 
ever, even for conservative forces, the distribution of the 
line integral li[ip\ may be non-trivial over the path en¬ 
semble if there are multiple initial and final states. The 
moment functionals for any such quantity are again of 
the multinomial form: 


(mr = 

E 

jo 


,3 o 1 3 15 


,3i -1 


t-i 


](t/( ( 7 J+1 ,a l )) Ji . (23) 


i —0 


This suggests that methods for calculating time or action 
moments can be applied to any path property in this 
general class. 


F. Path statistics on a homogeneous network 

We now consider these statistics of path length, time, 
and action in the simple case of a network with homoge¬ 
neous properties. We first assume that the waiting time 
distributions ip(t\a) = ip(t) are identical for all states, 
with raw moments 9^ and cumulant moments 0^ ■ We 
do not assume anything about the jump matrix Q (i.e., 
the network connectivity). In Appendix B we derive an 
exact relation between path time and length moments for 
arbitrary ip(t): 


n 

( T(n) ) = E ( ^ ) B ( e cK 0 c 2) . ' ■ • , ^”- fc+1) ) , (24) 

fe=l 

where B n ^ are the partial Bell polynomials 37 . For ex¬ 
ample, the first few moments are 


The action functionals (Eq. 20) share a similar multi¬ 
nomial form with the time functionals (Eq. 16). This 
leads us to consider a more general class of path func¬ 
tionals with this form. Consider a path functional IA 
that sums some property over jumps in a path (or edges 
in the network), so that for a path ip of length £, 

l-1 

%]=^% +1 ,a,). (22) 

i=0 

In the case of action, [/(cq+i,cq) = — log(crj+i|Q|oi). 
Equation 22 is a discretized line integral along the path 
p, which suggests thinking of U as representing a force 
acting on the random walker as it traverses a path. The 
statistics of such forces over paths are especially inter¬ 
esting when the force is nonconservative, i.e., the line 
integral U\p\ depends on the whole path ip and not just 
on the end points. Non-transitive landscapes or non- 
gradient forces with this property have been considered in 


(T (1) }=0« <£>, 

(t (2) ) = (<?W) 2 (£ 2 ) + 0^ <£>, 

(V (3) J> = (^ 1} ) 3 <£ 3 } + 30 ^ 0 ™ (£ 2 ) + 0® (£). 

(25) 

Note that the nth time moment depends on all length 
moments up to n. Equation 24 holds for the cumulants 
<r(")) c and (£") c as well (Appendix B). In Appendix C 
we present an alternative argument for the first two mo¬ 
ments of Eq. 25 using the central limit theorem, and in 
Appendix D we study the special case when ip(t) is ex¬ 
ponential. 

As these results show, we can think of path time as 
a convolution between path length and the intermediate 
waiting times: the variation in total path times arises 
from both variation in path lengths as well as varia¬ 
tion in the waiting times. If ip(t) is a delta function 
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(discrete-time process), then 9 ^ = 0 for n > 1 (no 
variation in waiting times), and (T^) = (9^) n (C n ) 
exactly: path lengths and times are identical up to an 
overall scale. This is consistent with our previous no¬ 
tion that the path length distribution fully describes the 
discrete-time projection of the process. However, even 
with continuous-time distributions i/;(t), the approxima¬ 
tion (T (n) ) ~ (6?W)" {£"), and therefore the approx¬ 
imate equivalence of the discrete- and continuous-time 
processes, may still hold if the waiting times are not too 
broadly dispersed (so the higher moments of ip(t) are not 
too large). We can make this observation more quanti¬ 
tative by expanding Eq. 24 as 

(r< n) ) = (o^y (C n ) 



(26) 


where 


we expect this to hold, since ~ 1 for exponential- 
like waiting time distributions and the mean path length 
£ is usually very large. We will investigate the validity of 
this condition in later examples. 

We also consider path action on a homogeneous net¬ 
work. Path action depends only on the jump matrix 
Q and not on the waiting time distributions ip(t\a), so 
as a simple example we take all states in S to have 
the same number 7 of outgoing jumps (nearest neigh¬ 
bors on the network) and all such jumps to have equal 
probability 7 ” . Therefore the probability of a path is 
V[ip\ = 7 _£ ^], and the action is <S[y>] = £[</>] logy. This 
means that the distribution of path actions is exactly 
equivalent to that of path lengths (rescaled by a factor 
of logy), and the moments are 

(S n ) = (£") log" 7 . (30) 

Since path lengths typically have an exponential distri¬ 
bution asymptotically (Eq. 13, Appendix A), path action 
will therefore also be asymptotically exponential as well, 
with mean ^logy. 


III. MATRIX FORMULATION AND NUMERICAL 
ALGORITHM 


0(cv) 



(27) 


is the waiting time coefficient of variation (CV), i.e., the 
standard deviation divided by the mean. The CV mea¬ 
sures the relative dispersion of a distribution; it always 
equals 1 for exponential distributions. As with Eq. 24, 
Eq. 26 holds for the cumulants (£ ra ) c and (T^) c as well. 

Equation 26 implies that path length and time mo¬ 
ments will be approximately proportional, and hence the 
whole distributions should be similar, if 


/ \ 2 /r n ~ 

( 0(cv) ) V) <<c L (28) 

The quantity (£ n_1 ) / (£") is typically of the order of the 
inverse mean path length (£) = £; in particular this is 
true when path lengths have an exponential distribution, 
which is generally the case asymptotically (Appendix A). 
An important exception, however, is if lengths have a 
Poisson distribution, so that (£” _1 ) c / (£ n ) c = 1 in the 
cumulant version of Eq. 26. Apart from this special case, 
the condition of Eq. 28 is equivalent to 

(V cv )) 2 < £, (29) 

that is, the waiting time distribution must be sufficiently 
narrow compared to the mean path length. In many cases 


Besides gaining general insights into the relationships 
between distributions of path lengths, times, and actions, 
the path ensemble formalism is convenient because we 
can efficiently calculate many ensemble averages using 
recursion relations that implicitly sum over all paths. We 
now derive these relations and show how to implement 
them numerically. 


A. Recursion relations 

We reformulate the problem in terms of matrices to 

(n) 

express the sums over paths more explicitly. Let T) ' 
be an TV x TV matrix (TV is the number of states in S) 
such that the matrix element |cr) is the nth time 

moment of all paths of exactly length £ from o to a'. In 
particular, the zeroth-order matrix gives the total 
probability of all paths going from cr to a' in exactly £ 
jumps. The initial condition is 


T( n) = a ni 0 i, (31) 

where 1 is an TV x TV identity matrix. If our path ensemble 
is the set of first-passage paths to final states Sfi na i with 
an initial distribution vector |7To) = 7To(er)|cr), then 


£ MT^Itto) = (32) 

cGSfinal V 3 










is the nth time moment for all paths of exactly length t, 
and 
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E E H T / n) Ko> = (r (n) ) (33) 

t =0 O-eSfinai 

is the moment averaged over paths of all lengths. This 
expression illustrates how to express the previous path 
ensemble averages in the matrix formulation. 

( U ) 

The key advantage of the T), matrices is that they 
obey the following recursion relation (Appendix E): 

t£° = Q E (") © (j) tE J) , (34) 

3=0 W' 

where ©(") is an TV x TV matrix with waiting time mo¬ 
ments for each state along the diagonal: 


(ct'IQEct) = {cr'IQIcr) (— log<cr'|Q|cr)) J . (38) 

In fact, if uE is the matrix such that 

OO 

E E MUf |7ro} = <^> (39) 

t=0 O-CSfinal 

for any path functional IA in Eq. 22, it obeys the recursion 
relation 



where (<r'|ri(T)| cr ) = (ct / |Q|ct)(C/(ct' ; ct))- 7 (Appendix E). 
Therefore recursion relations of this form extend to a 
wide class of path statistics. 


(a'\&^\a)=6 a -^ n \a). (35) 

Appendix E also shows how this recursion relation for 
the path time moments generalizes to the case of non- 
separable waiting time distributions —>■ a ')- For 

the total probability (n = 0), the recursion relation 
of Eq. 34 is simply multiplication by the jump matrix: 
E 0) = QTfJj, since the total probability of going from 
one state to another in exactly £ jumps must be given by 
the product of the jump matrices tJ°* = Q^. 

Owing to the similar multinomial form of their path 
functionals (compare Eqs. 16 and 20), the path ac¬ 
tion moments obey a similar recursion relation. Define 
(ct , |sE|<t) to be the nth action moment of all paths of 
length £ from a to a', so that 


E E msEm = <£">. (36) 

t=0 O-SSfinal 


In Appendix E we show that these matrices obey the 
recursion relation 





Q(j) s («- j) 


(37) 


B. Transfer matrices 


To calculate the nth moment of time or action, we 
must carry out the recursion relation of Eq. 34 or 37 for 
all moments up to n. We can unify all these steps into a 
single transfer matrix operation convenient for numerical 
use. Let n max be the maximum moment of interest. De¬ 
fine the TV(n max + l)-dimensional column vector | t(£)) as 
a concatenation of |tto) for all n € {0,1 ,..., n max }: 


W)) 


Tfko) 

tEm 


(41) 


Define the basis vectors |<r, n) for a £ S and n £ 
{0,1 ,..., n max } so that the (a, n) entry of \t{£)) is 
the nth time moment at state cr at the ^th jump: 
(a, n\r{£)) = (cr|T^”Eo)- We similarly define the action 
vector 


i m 


sfVo) 

S^Vo) 

S^ max) |7T< 


(42) 


where the matrix Q*T) is defined so that 


Now define the TV(n max + 1) x TV(n max + 1) matrices 






(o)Q® 0 
(!)Q0 (1) 
©Q© (2) 

0 

0 

0 

(n)Q 0(O) 

(i)Q® (1) 

0 

0 

(o)Q 0(o) 

0 

max') Q Q (Hma:: 
max' 

) ( ^max 'j CD @ (^max 

'71 max 1 ' 

— 1) ( 71 max \ Q Q (n max 

'71 max 2/ 

-2) ... ( n ”“)Q0( 0 ) 

O9 (0) 

(?)Q {1) 

0 

0 

0 


Q)Q (0) 

0 

0 


©Q (2) 

(i)Q (1) 

(o)Q (0) 

0 

5 

ma x] Q(llmax) 

max' 

( 71 max \ Q(71 max — 1) 

'Tlmax 1 ' 

/ 7l max ^ 0( n max — 2) 
VtWx-2^ 

(llmax)Q(O) 



where each 0 is an N x TV zero matrix. We can ex¬ 
press the recursion relations of Eqs. 34 and 37 for all 
ti £ {0,1 ,..., n max } as 


|t(*)>=K|t(*-1)>, m)=G\ri(£-l)). (44) 

These recursion relations have the solutions 


|r(*))=KV(0)>, m)=G t \ V (0)), (45) 

where the initial conditions are 


HO)} = H0)> 


H) 

o 

o 


(46) 


Here each 0 represents a zero column vector of length 
N. We can think of K and G as transfer matrices that 
iteratively generate sums over the path ensemble to cal¬ 
culate moments. This is analogous to transfer matrices 
in spin systems that generate the sums over spin configu¬ 
rations to calculate the partition function 38 . The zeroth- 
order version of this formalism, which simply calculates 
powers of the jump matrix Q, is equivalent to the exact 
enumeration method for discrete-time random walks 3,39 . 

We can obtain most path statistics of interest by var¬ 
ious matrix and inner products on these vectors. Define 
the cumulative moment vectors 


(a,n\r) = ^(ojT© Ko), 

( 4 8) 

W,n\v) = EHS^ko). 

t=o 

These represent the total nth moments of time and action 
for all paths through each state cr, but weighed by the 
number of visits to that state since the sum over £ counts 
a path’s contribution each time it visits cr. In the case of 
n = 0, 


H,0|t) = <cr, 0 |t7> = ^((x|QVo>, (49) 

1=0 

since T© = S© = Q £ (Eqs. 34 and 37). This is actually 
the average number of visits v(a) to a state cr, since the 
probability of a path is counted each time it visits cr. For 
an intermediate state cr, multiplying the mean number 
of visits v(a) by the mean waiting time 9^\a) gives the 
average time spent in cr. When a is a final state in Sfi na i, 
the random walk can only visit it once (if it absorbs at 
that final state) or zero times (if it absorbs at a differ¬ 
ent final state), and thus the average number of visits 
v(cr) equals the probability of reaching that final state cr 
(commitment probability). 

If there are multiple final states in Sfi na i, we often wish 
to sum path statistics over all of them. Define the N- 
dimensional row vector (final| = XHesy jHI ( w ith 1 at 
the position for each final state and 0 everywhere else) 
and the (n max + 1) x _/V(n max + 1) matrix 


H = l r W)> H 


t=0 


t =0 


(47) 


(final| 0 • ■ ■ 0 

0 (final| • • • 0 


0 • • • (final| 


( 50 ) 


Elements of these vectors are (using Eqs. 41 and 42) 


0 
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where each 0 is a zero row vector of length N. Multiply¬ 
ing this matrix on a corresponding vector will sum over 
all final states for each moment, leaving an (n max + 1)- 
dimensional vector with the total moments. For example, 


F|r(£)> 


(final|Tj, 0) |7r 0 ) 

(finallT^Vo) 

(final|T^ max) |7ro) 


t( n max)^) 


= i m, 


(51) 


where we use the shorthand t^ {£) for the total nth time 
moment absorbed at the £th jump. Note that (£) = 
p{£) is the probability of reaching any of the final states in 
exactly £ jumps. Thus this method allows us to calculate 
the entire path length distribution. On the cumulative 
time vector |r), F returns the total time moments: 



XXoMTf Itto) 


r f(o) 1 

F|t) = 




— 



_ EfcoMT^“Vo) _ 


l(^max) 


where CO = ) is total time moment 

over all paths. The matrix F similarly acts on the action 
vectors \r}{£)) and I 77 ): 


F\ V (£)) 


Fir?) 


S (°){£) 
s^(£) 

g(n.max) (£) 
c( 0 ) 


g(^max) 


= \m> 


(53) 


where s^ n -*(£) is the nth action moment absorbed in all 
final states at the £th jump, and s^ is the total nth 
action moment. 

Finally, for any function of state B{c r), we can calculate 
the average value of that function at the f?th intermediate 
jump. Define the A^(n max + 1 (-dimensional row vector 


(B\= [(E,B(a)(a\) 0 ••• 0], (54) 

where there are n max zero row vectors 0 , each of length 
N. The row vector (B\ acts on \t{£)) to return the value 


of B(cr) averaged over the probability distribution across 
all intermediate states at the Mi jump: 


(B\t{£)) = ]T B(a)(a\T^ |tt 0 ) = B(£)- (55) 

G 

For example, if B(a) = 6^{cr), B(£) tells us the uncondi¬ 
tional mean time spent at the £th. intermediate jump. If 
B(a) is set to a position in space corresponding to state a 
(for systems that allow embedding of states into physical 
space), B{£) is the average position at the Mi intermedi¬ 
ate jump, which over all £ traces the average path of the 
system. 


C. Convergence and asymptotic behavior of path sums 

To numerically calculate the foregoing matrix quan¬ 
tities, we must truncate the sums over path lengths £ 
at some suitable cutoff A. If there are no loops in the 
network, then the jump matrix Q is nilpotent, meaning 
there is a maximum possible path length A such that 
Q* = 0 for all £ > A. In this case all sums converge 
exactly after A jumps. If the network has loops, paths 
of arbitrarily long length have nonzero probability. We 
must then choose a desired precision eCl and truncate 
the sums at £ = A when 


A 

1 — ^2 pCO < 6 an d 

t=o 


f(rim ax) (A) 

Ef=o i ( ” max) W 


(56) 


The first condition guarantees that the total probability 
has converged: all remaining paths have total probability 
less than e. The second condition indicates that the Ath 
contribution to the maximum moment n max is sufficiently 
small relative to the total moment calculated so far. 

A potential problem with the second convergence con¬ 
dition arises when the state space is periodic, so the fi¬ 
nal states can only be reached in a number of jumps £ 
that is an integer multiple of the periodicity (plus a con¬ 
stant). For instance, square lattices have a periodicity 
of 2. In that case, ? ramax ^(^) will alternate between zero 
and nonzero values as £ alternates between even and odd 
values. To prevent these zero values of t(" max )(f!) from 
trivially satisfying the second condition in Eq. 56, we also 
require that ?” max )(A) be nonzero. A more subtle prob¬ 
lem can arise if there are very low-probability paths with 
very large contributions to the higher time moments. For 
example, one can construct a model where there are ex¬ 
tremely long paths with probabilities much smaller than 
e but which make arbitrarily large contributions to the 
total time moments due to the waiting time moments at 
those states. The algorithm will satisfy the convergence 
criteria before these paths are summed and therefore miss 
their contributions. This is an extreme example, but in 
general one may need to reconsider the convergence test 
depending on the properties of the model at hand. 
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How does the cutoff A depend on the maximum mo¬ 
ment n max ? To address this we must determine the 
asymptotic behavior of tS rl ^ (£) for different n. As long as 
the network has loops, the path length probability dis¬ 
tribution p(Z) is asymptotically exponential (Eq. 13; see 
Appendix A). To estimate the asymptotic dependence 
of the higher time moments on path length, we consider 
the special case of identical waiting time distributions as 
in Sec. IIF. Since 


^\e) = J2' p [‘p}r {n) m,c M , ( 57 ) 

we can use the approximation T^[<^] ~ (C[<p]) n 

from Eq. 26 (valid when the waiting time distributions 
are not too dispersed) to obtain 


m) n 6 e ,c M 

= (d w ey P {£) ( 58 ) 

~ (e w e) n e~ ae,i . 


Since re~ alL l l = e - Q ^+ nlo g^ the higher moments de¬ 
cay nearly exponentially (up to a logarithmic correction) 
with the same rate as the probability, set by the mean 
path length i. We expect this asymptotic behavior to re¬ 
main valid even when the waiting time distributions are 
not all the same, as long as the length and time moments 
are approximately proportional; we will empirically ver¬ 
ify this expectation in later examples. 

Although all t^ n \Z) asymptotically decay with expo¬ 
nential dependence on Z, the logarithmic correction in 
the exponent shifts the exponential regime toward larger 
t for higher moments; this is why we test convergence on 
the maximum moment n max hr Eq. 56. Indeed, t^ n \Z) 
in Eq. 58 is maximized by ^ max = nZ/a , after which ex¬ 
ponential decay sets in. Thus we expect scaling for the 
cutoff to be A ~ n max £ to leading order. 

The approximate exponential dependence of the mo¬ 
ments also enables a convergence scheme more sophisti¬ 
cated than Eq. 56. Since we know the asymptotic depen¬ 
dence of all moments will be approximately exponential 
for long paths, we can simply calculate the moments for 
path lengths until all T”)(£) have reached their expo¬ 
nential tails, and then fit exponential functions and ex¬ 
trapolate to infer the contributions of the longer paths. 
Conceptually, this means that all long path behavior is 
contained in the statistics of shorter paths, since long 
paths are simply short paths with many loops 31,32 . In 
practice this procedure can help to avoid calculating ex¬ 
tremely long paths unnecessarily. 


Define initial distribution |zro) and set of final states 5fi na i 
Define jump and waiting time matrices Q, ©O (Eq. 35), Q 
(Eq. 38) 

Define transfer matrices K and G (Eq. 43) 

Define final state sum matrix F (Eq. 50) 

Define row vector (B | (Eq. 54) for each state function B(a) 
Define initial transfer vectors |r(0)) = 1 97 ( 0 )) = (|tto), 0,0,..., 0) T 
(Eq. 46) 

Define initial cumulative vectors |r) = |r(0)), 1 77 ) = r;(0)) 

For l G {1,2,3,...}: 

Update \t(Z)) = K|r(f — 1)) 

Update |— 1)) 

Increment |r) by |r(tj) 

Increment \rj) by | rj(t)) 

Set | i(e)) = F|r(<)> 

Set B(Z) = (S|r(f)) 

If 1-Eko P(n<e: 

if A n max 1(f) >0 and A™ m ax)(f)/ Y%i =0 *("”“)(£') < e: 

Break 

Output distributions |r) and \rj) of cumulative moments over 
states 

Output distributions \t(£)) and B(i) over path lengths 


FIG. 1. Pseudocode for the matrix calculations implemented 
in PathMAN. 


D. Numerical implementation in PathMAN 

We have implemented the aforementioned matrix for¬ 
mulation in a Python script called PathMAN (Path 
Matrix Algorithm for Networks), available at https:// 
github.com/michaelmanhart/pathman, with additional 
scripts for generating examples and analyzing output. 
Figure 1 shows the pseudocode. Since the jump matrix 
Q is typically very sparse, we can store all matrices in 
sparse formats for efficient storage and computation us¬ 
ing SciPy’s sparse linear algebra module 40 . The script 
is general enough to treat any CTRW on a finite dis¬ 
crete space given a list of states, their jump probabili¬ 
ties, and at least their first waiting time moments. The 
current implementation assumes separable waiting time 
distributions, but modifying it to run the calculations for 
non-separable distributions (Appendix E) is straightfor¬ 
ward. The user can specify any path boundary condi¬ 
tions (initial distribution and final states) and functions 
of state B(a) to average over. The script reads all input 
data from plain text files in a simple format (see GitHub 
repository for documentation). 

The rate-limiting step of the algorithm is multiply¬ 
ing the transfer matrices K and G with the vectors 
| t(1 — 1 )) and \rj(Z — 1)) (Fig. 1) to obtain |r(£)) and 
| T](Z)), so we use this step to estimate the time com¬ 
plexity of the algorithm. Assume that each state has 
an average of 7 outgoing jumps, so that the jump ma- 
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trix Q has 7 N nonzero entries. Each transfer matrix has 
(n max + l)(^max + 2)/2 nonzero blocks (Eq. 43), yielding 
approximately 7 iV(n max + l)(n max + 2)/2 total nonzero 
entries in K and G. Since we multiply these matrices by 
the lV(n max + l)-dimensional state vectors at each of the 
A total jumps, the algorithm scales as 

o K^NA ). (59) 

Assuming there are loops in the model (i.e., there is no 
maximum possible path length), the cutoff A scales lin¬ 
early with the mean path length t, as well as the max¬ 
imum moment n max as argued in Sec. Ill C. For simple 
random walks, the mean path length scales as a power of 
the total number of states: 

N d„/d { for d w > d { , , 6Q . 

N for d w < df, 

where d w is the dimension of the random walk and df is 
the fractal dimension of the space 11,14 . Strictly speaking 
these scaling relations depend on the boundary condi¬ 
tions (proximity of the initial and final states) and the 
presence of an energy landscape; the scaling relations in 
Eq. 60 are a “worst-case scenario” when the landscape is 
flat and the initial and final states are very far from each 
other. Altogether this implies the algorithm will scale as 

for d w > d { , 
CH^maxA^ 2 ) for dw < df. 

Alternative recursive expressions for first-passage time 
moments on ID lattices scale as 0(N 3 ) 41 , as do general 
methods for solving the backward equation (a linear sys¬ 
tem) 42 ; our scaling will be equivalent in the extreme case 
of a fully-connected network where 7 = N — 1. 

IV. EXAMPLES 

We now illustrate the path ensemble approach on a 
series of simple examples. 

A. ID lattice 

We first consider a Markov CTRW on a ID lattice. 
Let the lattice have L sites with equal and symmetric 
transition rates between neighboring sites: 

(x + l|W|x) = 1 for 1 < x < A, 

( x — l|W|x) = 1 for 1 < x < A, (62) 

(y|W|x) =0 otherwise. 

From the rate matrix W we can obtain the jump ma¬ 
trix Q and the waiting time moments 9 using Eqs. 2 


and 5; note that the reflecting boundary conditions mean 
the “bulk” states (1 < x < L) have 0^ lk = 1/2 while the 
“edge” states (x = 1, x = A) have 6^ ge = 1 due to their 
different connectivities (numbers of outgoing jumps). We 
consider the ensemble of first-passage paths from one end 
of the lattice (x = 1) to the other (x = A). Figure 2(a) 
shows the distributions i l - n ' s (i) of path time moments 
over path lengths; the path length probability distribu¬ 
tion p(£) = t^\tj is very close to exponential except for 
small £, while the higher moments illustrate the Erlang¬ 
like function derived in Eq. 58. In particular, we confirm 
that the higher time moments decay approximately ex¬ 
ponentially for large £. Since the connectivity is nearly 
the same everywhere for large L (7 = 2 for all states ex¬ 
cept x = 1 and x = A), Eq. 30 indicates the distribution 
of path action will also be exponential with mean action 
(path entropy) « ldog2. 

We now introduce a potential energy V(x) = (A — 
x)/(L — 1) that provides a constant force down the lat¬ 
tice (Fig. 2(b), inset). If we use Metropolis transition 
rates 38 (y|W|x) = min(l, e~^^ v ^~ v < - x ^), where p is the 
inverse temperature, we obtain a biased random walk 
with forward rate of 1 and backward rate of 


(x + l|W|x) =1 for 1 < x < A, 

(x — l|W|x) = e -PK L ~ l ) for 1 < x < A, (63) 

(y|W|x) =0 otherwise. 

As we increase the inverse temperature P from zero, the 
bias becomes exponentially stronger, leading to a distri¬ 
bution of path lengths more tightly concentrated around 
the minimum length t = A — 1 (Fig. 2(b)). Since only 
a single path with probability 1 is available in the limit 
P —> 00 , the distributions of path lengths and actions be¬ 
come delta functions at zero; in particular, path entropy 
is zero because the process is completely deterministic. 

What is the distribution of path times as a function of 
pl In Fig. 2(c) we show the mean path time which 
decreases dramatically as the bias is increased through p. 
It is almost exactly proportional to the mean path length 
/A) for all P, as predicted by Eq. 26 since = 1 and 
/A) = l 1 . We also show the coefficients of variation 
(CVs) i ( cv ), t ( cv ) of the path length and time distribution, 
which measure the dispersion. For P = 0, both CVs 
are very close to 1 , suggesting that the distributions of 
path lengths and times are approximately exponential. 
However, as P becomes large the length CV drops 
to zero, since the length distribution becomes a delta 
function, but the time CV t( cv ' ) decreases to a small but 
nonzero value, indicating that the distribution becomes 
narrowly but finitely distributed around its mean. 

Besides CV, standardized moments offer a useful way 
to characterize the shape of a distribution 43 . They are 
defined as dimensionless moments of a random variable 
X shifted and rescaled to have mean 0 and standard de¬ 
viation 1 : 
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(a) 



(b) 



(c) 



(3 +1 


(d) 



FIG. 2. Distributions of path lengths and times on a ID lattice, (a) The nth time moments (£) for paths of length £, 
normalized as fractions of the total moments t , in the absence of a potential energy, (b) Path length probability distribution 
p(£) for several choices of /3 on a linear energy landscape V(x) (inset), (c) The mean path length £^ (scaled by the mean 
waiting time 9 ^ = 1/2 for bulk states), mean path time Z 1 / path length CV fl cv \ and path time CV / cv ^ as functions of f). 
(d) Skewness kurtosis t^ d , hyperskewness and hyperkurtosis t^ d of path time as functions of /3. Values of /3 in (c) 
and (d) are shifted by 1 to show f3 = 0 on a log scale. In all panels we use a lattice of length L = 1000 with transition rates 
given by Eq. 63. 


POstd = 7 


P-WD 


(x-(x)y 


iii ■ 


(64) 


Since the first and second standardized moments are 0 
and 1 by construction, the lowest non-trivial moment is 
the third moment, traditionally known as skewness since 
it measures the asymmetry of the distribution around the 
mean. The fourth standardized moment is the kurtosis; 
the fifth and sixth standardized moments are sometimes 
called the hyperskewness and hyperkurtosis. For an ex¬ 
ponential distribution, the nth standardized moment is 
In, i.e., the subfactorial or the number of derangements 
of n objects. Therefore exponential skewness, kurtosis, 
hyperskewness, and hyperkurtosis are 2, 9, 44, and 265, 
respectively. For a Gaussian distribution, the first four 
standardized moments are 0, 3, 0, and 15, respectively. 

Figure 2(d) shows the first four non-trivial [n > 3) 
standardized moments of path time on the ID lattice as 
functions of /3. For /3 = 0, the standardized moments are 
very close to their exponential values, confirming that the 
distribution of first-passage times for a simple random 


walk is very close to exponential. However, as we increase 
the rightward bias by increasing /?, the moments undergo 
a rapid transition near f3 ss 10. Note that this transition 
happens at rather low temperature (T = /3 _1 = 10” 1 ) 
compared to the total change in energy across the lat¬ 
tice, which is set to 1. For very large /3, the standard¬ 
ized time moments saturate at the Gaussian values of 0, 
3, 0, and 15. This is because a single path of minimal 
length £ = L — 1 tends to dominate at low temperatures 
(Fig. 2(b)), and thus the total path time is just the sum 
of the waiting times along the single path. By the central 
limit theorem, this sum will be approximately Gaussian 
for large L. Since 0^ ge = 0 { ^ lk = 6»^ } ge c = 6^^ = 1 
in this limit (bulk and edge states are the same since 
travel along the lattice is one-way), the mean and vari¬ 
ance of path time in this limit should be the path length 
£ = L — 1 = 999 times the mean and variance of each 
waiting time: Y 1 ') = (L — l)0[uik = and = 
(L — l)0(, u ; lkc = 999, leading to a coefficient of variation 

? cv ) = V999/999 « 0.03. This agrees with Fig. 2(c). 
This simple model is reminiscent of downhill folding in 
proteins 17 and linear biochemical pathways such as those 
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used in kinetic proofreading 44 , where non-exponential ki¬ 
netics and the transition between exponential and de¬ 
terministic (narrow Gaussian distribution) regimes have 
been previously investigated. 

Equation 26 suggests that the length and time dis¬ 
tributions should be very similar even for /3 —> oo, 
since the correction term is still small (tK cv ) = 1 and 
£ = L — 1 = 999 1). Indeed, the time distribution is a 

Gaussian narrowly distributed around its mean, whereas 
the length distribution is a delta function; the errors in 
the moments are of the order l/£ ss L~ l . However, this 
slight difference is better resolved by considering the com¬ 
plete relation between length and time moments (Eq. 24) 
with cumulants (T < ' n ' > ) c and (£") c instead of the raw mo¬ 
ments. For /3 — > oo, all (£") = 0 for n > 2, which means 
that we cannot expand Eq. 24 for the cumulants as in 
Eq. 26. Instead, the only nonzero terms in Eq. 24 yield 
the exact equation (T ^) c = 9 ^ (£} c for all n. 


B. ID comb and memory from coarse-graining 

We now turn to an example that explores the effects of 
waiting memory on distributions of path times. We con¬ 
sider the ID comb: a ID backbone of length ^backbone 
where each site has a ID tooth of length L t ooth extend¬ 
ing from it (Fig. 3(a)). Combs have traditionally repre¬ 
sented simple models of diffusion on percolation clusters 
and other fractal structures in disordered materials 4,45 ; 
more recently they have also been proposed as a model 
for cancer cell proliferation 46 . As in the previous ex¬ 
ample (Eq. 62), we use symmetric transition rates of 1 
between neighboring sites in the comb. If we are pri¬ 
marily interested in diffusion along the backbone rather 
than within the teeth, it is natural to coarse-grain each 
tooth into a single effective backbone state with some ef¬ 
fective waiting time distribution -0(f) that describes the 
time spent exploring the tooth before returning to make 
a jump along the backbone (Fig. 3(a)) 4,45 . The wait¬ 
ing times within each coarse-grained backbone state are 
therefore the first-passage times to return to the back¬ 
bone after exploring the tooth. The distribution of these 
return times, /tooth (t), has the approximate form 4 


2 

tooth’ 

2 

tooth’ 

(65) 

where r is a time scale that is 0(1) in £ t ooth- Since the 
distributions of path times and lengths are very similar 
on ID lattices in the absence of potential (cf. the /3 = 0 
limit in Fig. 2(c)), the crossover time TL 2 ooth is essen¬ 
tially the characteristic time scale to explore a ID lattice 
of length Xtooth (Eq. 60). In Fig. 3(b), we show the path 
length distribution ptooth(£) to exit the tooth, which ac¬ 
cording to Eq. 26 should be approximately the same as 
the distribution of times /tooth (t) for large Ltooth- In¬ 
deed, ptooth(^) follows the form of Eq. 65 very clearly: 


4>(t) = /tooth (t) 


t 3 / 2 for t <tL 

e -t/(rL tooth ) f or j. > 7 "Jj 


there is a power law regime of £ 3 / 2 until approximately 
£ ~ tooth 1 a ft er which there is an exponential decay. 
To estimate the tooth waiting time moments 9^ from 
ip(t) = /tooth (t), we make the approximation that the 
power-law regime dominates the moment integrals 4 : 


T r2 

tooth 

0 (n) ~J dt t~^H n ~ tooth- (66) 

In Fig. 3(c) we verify this scaling by numerically calcu¬ 
lating the moments from first-passage paths that exit the 
tooth. 

The dominant power-law regime of ip(t) means that 
its statistics are very different from those of an exponen¬ 
tial distribution. For example, the CV is 9^ ~ L 3 ^ th 
rather than ~ 1, indicating a much broader distribution 
of times compared to the exponential case. The non¬ 
exponential nature of the waiting time distribution is in¬ 
dicative of memory within an effective backbone state: 
how much longer the system waits in the state depends 
on how long it has already waited. Mathematically, this 
apparent memory arises from coarse-graining each tooth 
into a single state, which erases information about the 
position of the system within the tooth. Indeed, for the 
distribution of times in Eq. 65, the mean waiting time 
starting from t = 0 is ~ L t0 oth, since it is dominated 
by the power-law regime (Eq. 66). However, if the sys¬ 
tem waits at least time ~ £ tooth 1 the mean additional 
waiting time becomes ~ L 2 ooth , due to the exponential 
regime. In other words, if the system does not leave by 
the time ~ L 2 ooth — meaning that it has diffused far 
from the backbone — it is likely to wait much longer as 
the exponential regime of ip(t) takes over. 

One effect of this memory is that it can lead to signif¬ 
icant differences between the distributions of path times 
and path lengths along the effective backbone states. 
Equation 26 shows that the moments of path time and 
path length are approximately proportional if the wait¬ 
ing time distributions are not too broad relative to ratios 
of path length moments; that is, the correction term in 
Eq. 26 is small if (f?( cv )) 2 -C (£") / ^£” -1 ^. We estimate 
the size of this correction for the comb model, focus¬ 
ing on first-passage paths from one end of the backbone 
to the other. The waiting time CV is 9 ( cv ) ~ ^tooth 
as previously mentioned. The path length distribu¬ 
tion, meanwhile, appears to be very close to exponen¬ 
tial: Fig. 3(d) shows that its skewness and kurtosis are 
consistent with their exponential values (2, 9) for any 
backbone length. Since the mean path length should be 
(£) = £~ L backbone (Eq. 60), this implies that the higher 
moments are (£”) ~ n --^backbone- Therefore the correc¬ 
tion term in Eq. 26 is approximately 




1) 


L 


tooth 


L 


2 

backbone 


(67) 


Thus when £ t ooth < T backbone , we expect path lengths 
and times along the backbone to have similar statistics, 
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FIG. 3. Distributions of path lengths and times on a ID comb, (a) Schematic of a comb with backbone of length 
^backbone and teeth of length Ltooth- We coarse-grain the teeth into single states (orange) along the backbone with effective 
waiting time distributions ^> e dge(t) and i/wk (t). (b) Path length distribution ptooth(f) to exit a tooth of different lengths I/tooth, 
along with the power law £ -3 / 2 for comparison, (c) Mean and second moment 0jy) lk of i/wk(t) as functions of tooth length 
Ltooth; points are numerical calculations, while dashed lines show expected scaling behavior from Eq. 66. (d) Skewness ^std 
and kurtosis ), t^ d of length and time distributions for paths along the backbone as functions of ^backbone, with Ltooth = 100. 


with a pronounced difference in the opposite limit. In 
Fig. 3(d) we calculate skewness and kurtosis of path time 
moments, varying ^back bone wh ile fixing Ltooth = 100. 
Indeed, for ^backbone < V^tooth = 10, there is a large dis¬ 
crepancy between path length and time moments, while 
for ^backbone > 10 they become very close. The fact 
that the length of the teeth must be large compared to 
the square of the backbone length to have an apprecia¬ 
ble effect on the path statistics indicates that dynamics 
along the backbone, rather than within the teeth, tend 
to dominate the first-passage process. 


C. Memory in coarse-grained metastable states 

As the previous example showed, memory, in the form 
of non-exponential distributions of waiting times, nat¬ 
urally arises from coarse-graining because information 
about the microscopic states of the system is lost. This 
principle plays a crucial role in the generation of discrete 
stochastic models for protein folding and other molec¬ 
ular processes 16 . In these models, a high-dimensional 
space of “microscopic” states (e.g., protein conforma¬ 
tions) is coarse-grained into a discrete set of “macro¬ 


scopic” metastable states with some effective transition 
probabilities. The resulting coarse-grained model is more 
amenable to calculating statistical properties of protein 
dynamics over long time scales, such as the mean fold¬ 
ing time or kinetic bottlenecks 1,2 . However, the coarse- 
graining can result in qualitative differences between the 
approximate macroscopic model and the true underlying 
microscopic dynamics 16 , including non-exponential wait¬ 
ing times in the effective states that are not addressed by 
conventional transition path theory 15 . 

As a simple illustration of this phenomenon, we con¬ 
sider a 2D double-well potential 

V(x,y) = (x 2 - l) 2 + 2y 2 , (68) 

as shown in Fig. 4(a). This potential has two local min¬ 
ima at (±1, 0) with V = 0, a central barrier at (0,0) with 
V = 1, and reflecting boundaries at x,y = ±2. At low 
temperatures, the system will spend most of its time in 
the basins around the two minima. Therefore it is natu¬ 
ral to coarse-grain the “microscopic” 2D space into two 
metastable states, A and B, separated by the central en¬ 
ergy barrier (Fig. 4(a)). To characterize the statistics of 
the two-state dynamics, it is common to calculate a single 
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reaction rate (inverse of the mean time) from one state 
to another. However, using a single rate parameter im¬ 
plicitly assumes that waiting times in the coarse-grained 
states are distributed exponentially. Here we show this 
to be a poor approximation. 

When the system first transitions to B from A, it starts 
just to the right of the interface separating the two states 
(Fig. 4(a)). The effective waiting time in the coarse¬ 
grained state B is therefore the time until the system 
first returns to that interface, starting from one step off 
it. We explicitly calculate the first-passage paths for this 
microscopic process using our numerical method. We dis¬ 
cretize the space into a 2D lattice with Ax = 0.05, and we 
assume a Markov CTRW on the lattice with Metropolis 
transition rates for jumps between nearest neighbors. Al¬ 
though the system can enter state B at any point along 
the interface with A, for simplicity we assume that it 
enters through the central barrier at (0,0) (the lowest- 
energy point along the boundary) and therefore starts in 
B at (Ax,0) (marked by the blue square in Fig. 4(a)). 
For low temperature (/? = 10), in Fig. 4(b) we show the 
distributions of path lengths (blue line) starting from this 
point and returning anywhere along the interface; an ex¬ 
ample of such a path is shown in Fig. 4(a) (blue line). The 
distribution has a power-law regime for small t and ex¬ 
ponential regime for large t. These two asymptotic limits 
are the same as the distribution of waiting times in the ID 
tooth (Eq. 65, Fig. 3(b)), and indeed they have a similar 
physical basis: the power-law regime arises from paths 
that quickly return to the interface without falling into 
the low-energy basin, while the exponential regime arises 
from paths that fall into the low-energy basin before re¬ 
turning. In contrast to the ID comb, though, there is a 
broad flat region of the distribution between the power 
law and exponential regimes. This is actually part of 
the distribution of paths that fell into the low-energy 
basin before returning; it corresponds to an intermediate 
regime before that distribution hits its asymptotic ex¬ 
ponential regime (cf. path length distributions on a ID 
lattice in Fig. 2(b)). We confirm this by directly calcu¬ 
lating paths with the starting point in the basin (red line 
in Fig. 4(b); an example path is shown in Fig. 4(a) from 
the red square to the red circle). 

In Fig. 4(c) and 4(d) we demonstrate that the distri¬ 
bution of path times, i.e., the waiting times in the coarse¬ 
grained state B (or A by symmetry), is nearly identical to 
the path length distribution shown in Fig. 4(b). Besides 
the mean time t W, we also calculate the CV and stan¬ 
dardized moments for both length and time distributions, 
which are indistinguishable over the entire range of /3 
(despite the heterogeneities in waiting time distributions 
across the lattice). Hence the path length distribution in 
Fig. 4(b) also describes the effective waiting time distri¬ 
bution in the coarse-grained state. The power-law regime 
of this distribution for short paths leaves a distinct signa¬ 
ture in the moments. Even at /3 = 10, which represents 
a temperature that is 10 times smaller than the lowest 
energy barrier (such that we expect the metastable ap¬ 


proximation of A and B to be very good), the distribution 
of times deviates strongly from an exponential distribu¬ 
tion: the CV is nearly 3, while the skewness and kurtosis 
are much larger than their exponential expectations. At 
lower /3 (higher temperatures), the deviation from an ex¬ 
ponential distribution becomes even more pronounced. 
As with the comb, this enrichment of the distribution for 
very short paths means that given the system just tran¬ 
sitioned to B, it is likely to quickly transition back to A. 
But if it does not transition back quickly, it is likely to 
wait much longer as it falls into the basin and the waiting 
times become exponentially distributed. 

From the comb and double-well examples we can de¬ 
duce some general principles for the waiting memory that 
results from coarse-graining the state space. Assume that 
the microscopic state space for a system is d-dimensional 
Euclidean space, which we coarse-grain into N macro ef¬ 
fective macroscopic states, each consisting of V m j cro mi¬ 
croscopic states. The interfaces between coarse-grained 
states have dimension d— 1. When the system first enters 
one of these coarse-grained states, it begins just inside an 
interface. Therefore the waiting time distribution ij)(t ) 
in the coarse-grained state is the first-passage time to re¬ 
turn to that (d — l)-dimensional interface. This return 
process is effectively a ID random walk, since only the di¬ 
rection normal to the interface matters (at least within a 
neighborhood of the initial state, assuming the interface 
is locally flat). Therefore the distribution of first-passage 
times to return to the interface will be the same as for 
the ID tooth in the comb (Eq. 65): 


ip(t) 


f-3/2 

e -t/(rAr^ icro ) 


for t < TiV^ licro , 
for t > tN^. 1cio , 


(69) 


where the crossover time between these regimes is the 
characteristic time scale TN^ iclo to explore the coarse¬ 
grained state (Eq. 60; v = 2 for d = 1, v = 1 for d > 2), 
and r is a microscopic time scale which is 0(1) in V m i cro . 
The waiting time moments are approximately 


0 (n) 



t -3/2 f n 


pri/(n— 1 / 2 ) 

v micro 


In particular, the CV is 


(70) 


q( cv ) 



N, 


1/2 


N 1/4 


for d = 1, 
for d > 2. 


(71) 


As with the ID comb, the CV scales as a power of the 
coarse-grained state size, but rather slowly. 

We can now determine whether such waiting time dis¬ 
tributions will lead to different statistics of path lengths 
and times in the coarse-grained model. Since the mean 
path length in the coarse-grained model is l ~ V^ acro 
(assuming the microscopic and coarse-grained spaces 
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FIG. 4. Effect of memory in coarse-grained metastable states, (a) Double-well potential (Eq. 68) exactly coarse-grained 
into states A and B with boundary along the green dashed line. We also show example paths (solid blue and red lines) that 
both exit B (circles) but start from different initial conditions within B (squares), (b) Length distributions p(£) of paths (with 
P = 10) that exit B but start at different initial conditions (at the central barrier, corresponding to the blue square in (a), or in 
the low-energy basin, corresponding to the red square in (a)), along with the power law £~ 3 ' 2 for comparison. For paths that 
exit the coarse-grained state B, (c) the mean time and CVs P cv \ of length and time, and (d) skewness , t^ d and 
kurtosis litl, tlfl of length and time, all as functions of P. All calculations use a discretized square lattice with Aa; = 0.05 over 
the space (x,y) G [—2,2] x [—2,2]. 


have the same dimensionality), the condition (#( cv )) 2 -C £ 
(Eq. 29) for equivalent path length and time statistics be¬ 
comes 


-Wiicro ^ -^macrcr (7%) 


This is consistent with the condition found for the ID 
comb where A ;, ' rn i f:ro — Ltooth and lV macr0 = ^backbone- 
For the double-well model, iV m i cro ft! 3200 (number of 
microscopic lattice points in A or B) and N maclo = 2; 
Eq. 72 does not hold in this case, so we expect signifi¬ 
cant differences in the statistics of path lengths (jumps 
between A and B) and path times in the coarse-grained 
model. In general, Eq. 72 implies that the more coarse- 
graining there is (resulting in fewer but larger coarse¬ 
grained states), the more significant the memory effects 
are on the effective CTRW. 


D. Random barrier model and memory from spatial 
disorder 

To further demonstrate the effects of a complex energy 
landscape on path statistics, we consider the random bar¬ 
rier model 3 ’ 30 (RBM), a simple model of transport in dis¬ 
ordered systems. In this model, a particle diffuses across 
a regular lattice with quenched energy barriers of random 
height between neighboring points. Here we consider a 
2D lattice with energy barriers drawn from an exponen¬ 
tial distribution p(E) = E^ x e~ E l E °, where E 0 is the 
average energy 47 . We assume a Markov CTRW on this 
lattice with symmetric transition rates between neighbor¬ 
ing states that depend exponentially on the intervening 
energy barrier: 

(x',y'\W\x,y}=r 0 e-e E V’ v, -’ x ’ y \ (73) 

where To is the rate of traversing a barrier of zero height 
(maximum possible rate), and E(x' ,y'\x,y) is the en¬ 
ergy barrier between (x',y') and (x, y). We use reflect¬ 
ing boundary conditions and set Tq = Eq = 1 without 
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loss of generality, as these two quantities set the overall 
time and energy scales. From the rates in Eq. 73 we de¬ 
termine jump probabilities and exponential waiting time 
moments using Eqs. 2 and 5. 

Figure 5 shows a single (quenched) realization of the 
RBM on a 10 x 10 lattice for different values of /3. In 
each panel cells correspond to lattice points while the 
gray-scale bars between them indicate the height of the 
intervening energy barriers (same in all panels). Due 
to the exponential distribution of energies, most barriers 
are low (black), with only a few relatively high barri¬ 
ers (white). We consider the ensemble of first-passage 
paths on this landscape from (1,1) (bottom-left corner) 
to (10,10) (top-right corner). First we determine path 
statistics for /3 = 0 (Fig. 5(a)), where all transition rates 
are equal and the barriers have no effect (Eq. 73). The 
leftmost panel of Fig. 5(a) shows the mean waiting time 
9^ 1 \x,y) in each state. When all transition rates are 
equal, 9 < ' 1 \x,y ) depends only on the state’s connectiv¬ 
ity: the states in the bulk with more neighbors have 
shorter mean waiting times than do the edge and es¬ 
pecially the corner states, which have fewer neighbors. 
The middle panel of Fig. 5(a) shows the average num¬ 
ber of visits v(x , y) to each state during the first-passage 
process (Sec. Ill B). For /3 = 0, the number of visits 
depends on both the distance to the final state as well 
as the state’s connectivity: edge and corner states with 
fewer neighbors are visited less often than are bulk states 
the same distance from the final state. When we con¬ 
sider the mean fraction of time spent in each state (the 
product of 9 < - 1 \x,y) and v(x,y ), normalized by the total 
mean path time t; rightmost panel of Fig. 5(a)), the 
connectivity-dependence largely disappears, so that the 
fraction of time depends mostly just on the distance to 
the final state. 

For (3 > 0 the effects of the random energy barriers 
emerge. In Fig. 5(b) and 5(c) we show path statistics for 
f3 = 1 and /3 = 5. States with large barriers around them 
acquire significantly longer mean waiting times, leading 
to a very broad distribution of time scales; at /? = 5, 
the mean waiting times span three orders of magnitude 
(Fig. 5(c), leftmost panel). However, states with ex¬ 
tremely long mean waiting times also tend to have many 
fewer visits on average (Fig. 5(b) and 5(c), middle pan¬ 
els). This is because the high energy barriers that make 
these states difficult to exit also make them difficult to 
enter in the first place. In contrast with /3 = 0, where 
v(x,y) was determined by both the distance to the fi¬ 
nal state and the state’s local connectivity, for large /? 
the average number of visits becomes predominately de¬ 
termined by the state’s local properties, i.e., its mean 
waiting time, rather than its global position on the lat¬ 
tice. However, the heterogeneity of 9^\x, y) and v(x,y) 
across the lattice nearly vanishes when considering their 
product, the mean fraction of time (Fig. 5(b) and 5(c), 
rightmost panels): as with /3 = 0, the distance to the fi¬ 
nal state primarily determines the fraction of time spent 
at a lattice point. Instead of varying smoothly across 


states as for /3 = 0, though, at /3 = 5 the fraction of time 
appears to have four distinct plateaus on the 2D lattice. 
Within each plateau, the particle spends approximately 
the same fraction of total time at each lattice point. 

Figure 5 also shows the average of all first-passage 
paths (magenta lines). We calculate the average path 
by defining state functions for each spatial coordinate as 
B x (x,y) = x and B y (x,y) = y , and using Eq. 55 to de¬ 
termine the mean positions B x {€) and B y (£) as functions 
of the intermediate jump i along a path. We plot these 
mean positions for every 100th jump in Fig. 5 to repre¬ 
sent the average path of the particle. For /3 = 0, the 
average path is necessarily symmetric across the diago¬ 
nal and asymptotically converges toward the final state 
(Fig. 5(a)). For /3 > 0, the energy barriers slightly dis¬ 
tort the average path at the beginning, but as the path 
approaches the final state, these asymmetries largely av¬ 
erage out (Fig. 5(b) and 5(c)). 

We next consider the distributions of path length, 
time, and action for the RBM. For /3 = 0, all of these 
distributions are close to exponential in shape, as ex¬ 
pected from previous examples. For example, the length 
distribution has CV I( cv ) ~ 0.89, skewness ~ 1-99, 
and kurtosis ss 8.95. The moments for path time and 
action are also very close to these values: indeed, Eqs. 26 
and 30 imply that these distributions should all be very 
similar since the network is nearly homogeneous. 

What happens to these distributions in the presence of 
a complex energy landscape (/3 > 0)? Figure 6 shows dis¬ 
tributions of the first four moments over many quenched 
realizations of the RBM at different f3. For f3 = 1 both 
the mean path length and time are mostly close to their 
values at /3 = 0 (Fig. 6(a)), and their CVs and stan¬ 
dardized moments indicate that the distributions are still 
close to exponential (Fig. 6(b)-(d)). Larger /?, however, 
leads to a very wide range of possible length and time mo¬ 
ments, which can span several orders of magnitude across 
realizations. The correlations between path lengths and 
times are also significant. Mean lengths and times are 
mostly clustered along the diagonal, indicating their pro¬ 
portionality for most realizations, but there are some re¬ 
alizations with mean time much larger than mean length 
(Fig. 6(a)). For CVs and standardized moments, many 
realizations that deviate from exponential distributions 
do so equally in both length and time, resulting in points 
along diagonal. However, there are also many realizations 
with highly non-exponential distributions of path times, 
even though the length distribution is close to exponen¬ 
tial (Fig. 6(b)-(d)). In this case the approximate equiv¬ 
alence between path length and time in Eq. 26 breaks 
down not because (0( cv )) 2 t — this is never true in 
our RBM model since 9( cv ' ) (x,y) = 1 for all ( x,y) and 
i is always large — but because of the spatial disorder. 
Equation 26 is derived for a network with identical wait¬ 
ing time distributions at all states, but the RBM has a 
very broad range of mean waiting times for large /3 , as 
the example in Fig. 5(c) shows. Thus, a rugged energy 
landscape even with Markovian waiting times can lead 
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FIG. 5. Spatial properties of first-passage paths in the random barrier model. For a 10 x 10 lattice, we show statistics 
of first-passage paths from (1,1) to (10,10) for a single quenched realization of the energy barriers. Each colored cell corresponds 
to a lattice point (x,y), with gray-scale bars indicating energy barriers between lattice points (higher energies are white, lower 
energies are black). Energy barriers are randomly sampled from an exponential distribution with mean Eo = 1, and the 
transition rate across a zero energy barrier is Fo = 1. The leftmost column shows the mean waiting time 9 ( ' V> (x,y), the middle 
column is the average number of visits v(x,y), and the rightmost column is the average fraction of time 9^ 1 \x,y)v(x,y)/t ( ' 1 ^ 
spent at each lattice point. Rows correspond to different inverse temperatures: (a) (3 = 0, (b) (3 = 1, and (c) (3 = 5. Magenta 
points show the average particle position for every 100th jump (connected by straight lines to guide the eye). 


to non-exponential path statistics, and hence the appear¬ 
ance of memory; such non-exponential kinetics have long 
been discussed in the context of glasses 21,22 . 

Figure 6(e)-(h) shows similar distributions of mo¬ 
ments for path action (plotted against path length mo¬ 
ments for reference). Equation 30 shows that the mo¬ 
ments of length and action are proportional for networks 
with homogeneous connectivity. While the 2D lattice in 
the RBM is not exactly homogeneous due to boundary 


conditions, Fig. 6(e) shows mean length and action to 
be very nearly proportional for almost all realizations. 
Since mean action is equivalent to path entropy, its wide 
range of possible values indicates that first-passage in 
some realizations is dominated by a few relatively high- 
probability paths, while in other realizations it is domi¬ 
nated by a large number of much lower-probability paths. 
The higher moments of action in Fig. 6(f)-(h) indicate 
that usually action is nearly exponentially distributed 
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FIG. 6. Distributions of path statistics in the random barrier model. For 1000 quenched realizations of the energy 
barriers on a 10 x 10 lattice, we show: (a) mean path length versus mean path time (b) length CV fl cv ' > versus time 
CV A cv h (c) length skewness versus time skewness (d) length kurtosis versus time kurtosis (e) mean length 
£(a versus mean action s^ 1 ); (f) length CV versus action CV s < - cv ^; (g) length skewness versus action skewness s^i 
and (h) length kurtosis versus action kurtosis s^d- Blue points are /? = 1, red points are /3 = 3, and green points are /? = 5; 
the horizontal and vertical dashed magenta lines correspond to the values of the moments for /3 = 0. We also show a diagonal 
gray line with slope 1 to guide the eye. 


even for larger fj. In particular, many of the realiza¬ 
tions with non-exponential length distributions still have 
exponentially-distributed actions. This suggests that the 
distribution of path actions is much more weakly affected 
by the energy landscape compared to path lengths and 
times. 


V. DISCUSSION 

We have studied CTRWs on networks using statisti¬ 
cal mechanics of the path ensemble. A particular conve¬ 
nience of the path formalism lies in exploring the relation¬ 
ship between the distributions of path lengths and path 
times, which can be viewed as the relationship between 
the full continuous-time process and its discrete-time pro¬ 
jection. Discrete-time models have generally dominated 
the theory of random walks not only due to their simplic¬ 
ity, but also because we expect a continuous-time process 
on the same network to be nearly equivalent under cer¬ 
tain conditions 3,13 . A well-known exception to this ex¬ 
pectation is for waiting time distributions ip(t) without 
a characteristic time scale (divergent mean), which can 
produce anomalous diffusion even on regular lattices 3 . 
Using our approach, we have identified two more im¬ 


portant exceptions. If all states have identical waiting 
time distributions Eq. 26 shows that continuous- 
and discrete-time dynamics will have different statistics 
if (0( cv )) 2 > (£”} / (£" _1 ) ~ l, where 6^ is the CV 
of the waiting time distribution ip(t) and l is the mean 
path length. We should therefore expect significant dif¬ 
ferences between continuous- and discrete-time dynamics 
to occur when i^(t) is much more broadly dispersed than 
an exponential distribution (large 0( cv )) and for small 
state spaces, which produce small l (Eq. 60). Further¬ 
more, the 2D double-well example suggests this condition 
is still valid (Fig. 4(d)) even if the waiting time distribu¬ 
tions and jump probabilities vary across states, as long as 
they do not vary too much. If they do, however, we find 
another exception to the equivalence of continuous- and 
discrete-time dynamics: even with exponential waiting 
times, spatial disorder can produce very different distri¬ 
butions of path lengths and path times, as illustrated in 
the random barrier model (Fig. 6). 

Although we have focused primarily on moments of 
path statistics in this work, ideally we would like to know 
the entire distributions of these quantities. In princi¬ 
ple one can fit a parameterized distribution to the mo¬ 
ments. In most statistical applications this “method of 
moments” typically produces a good approximation for 























20 


well-behaved distributions, especially using a very gen¬ 
eral parameterization such as the Pearson distribution 43 . 
For path distributions, a linear combination of exponen¬ 
tial functions is likely the most appropriate choice. Since 
path length, time, and action distributions are frequently 
very similar, and since our method explicitly calculates 
the entire path length distribution already, fitting distri¬ 
butions from moments would be most valuable in cases 
where the continuous- and discrete-time processes are 
very different. 

In any case, the moments of path statistics themselves 
are valuable for quantifying deviations from a simple ex¬ 
ponential distribution. These deviations are important 
because they represent a form of memory: the amount of 
time for a process to occur depends on how much time 
has already passed. We have emphasized how coarse- 
graining many “microscopic” states of a system into a 
smaller number of effective “macroscopic” states gener¬ 
ally leads to non-exponential ip(t) in the coarse-grained 
states; we explicitly demonstrated this by coarse-graining 
teeth in a ID comb (Fig. 3) and low-energy basins in a 
double-well potential (Fig. 4). Furthermore, we have ar¬ 
gued that in coarse-grained states will frequently 
obey Eq. 69, with a power-law regime for short times 
and an exponential regime for long times. Physically, 
this distribution arises because the system always starts 
just inside the boundary of a coarse-grained state; there¬ 
fore, it can either quickly recross the boundary, leading 
to the power-law regime, or it can explore the rest of the 
coarse-grained state, leading to the exponential regime. 
Compared to a simple exponential distribution, this hy¬ 
brid distribution is enriched by the power law at short 
times, meaning that very short waiting times are much 
more likely than would be expected if the system started 
in the middle of the coarse-grained state rather than near 
the boundary. However, if the system does not quickly 
exit, it is likely to wait much longer as it explores the rest 
of the coarse-grained state. This effective ip(t) is typically 
much broader than an exponential distribution, indicated 
by its larger CV (Eq. 71); linking this with our condition 
on path length and time statistics ($( cv )) 2 -C I, we obtain 
a condition that shows how much coarse-graining is nec¬ 
essary to see significant memory effects in the statistics 
of path times (Eq. 72). 

Representing complex state spaces by simpler, coarse¬ 
grained representations has long been an implicit element 
of stochastic models. In recent years it has been ex¬ 
plored in Markov models of molecular systems such as 
proteins 1,2,16 . Non-exponential effects may be important 
in these systems, especially if the coarse-grained networks 
are not very large. Indeed, non-exponential distributions 
of transition times have previously been found for both 
protein folding 1 ' and enzyme kinetics 19 ; Reuveni et al. 20 
showed that these memory effects could lead to quali¬ 
tatively different properties of enzyme unbinding within 
the Michaelis-Menten framework. Our observations un¬ 
derscore the importance of going beyond characterizing 
such processes by single rates, which implicitly assumes 


an exponential distribution of times. 

Besides waiting memory in the form of non-exponential 
time distributions, an additional form of memory induced 
by coarse-graining is in the jump process. For example, 
consider a triple-well potential coarse-grained into states 
A, B, and C. When the system crosses the barrier from A 
into B, it is much more likely to jump back into A rather 
than jump to C, since it begins much closer to A in the 
microscopic space. We can account for this in our frame¬ 
work by extending the state space to include not only the 
current state of the system (e.g., A, B, or C), but also the 
previous state; the jump process in this extended state 
space is once again Markovian (although the waiting time 
distributions remain non-exponential). We also note that 
this coarse-graining may require non-separable waiting 
time distributions tjj(t\a —> er'), since the distribution of 
times to return to A from B may be quite different from 
the distribution to reach C. Our framework can readily 
address this generalization (Appendix E). We look for¬ 
ward to studying the combined roles of jump and waiting 
memory in coarse-grained molecular models. 
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Appendix A: Asymptotic form of the path length 
distribution 

The path length distribution p{l) is formally given by 
Eq. 12, which involves the sums of path probabilities V[<p\ 
for all paths ip of length More explicitly, we can write 
p(£) using matrix elements of powers of the jump matrix 
Q and summing over all final states: 

p(*)= E hqVo>, (at) 

^’GSfinal 

where |7To) = 7r o(o')|( J ) is the vector of initial state 

probabilities. We can decompose Q into its Jordan form 


Q = P(D + N)P \ (A.2) 

where D is a diagonal matrix with the eigenvalues of Q, 
N is a nilpotent matrix, and P is an invertible matrix 48 . 
Powers of Q are therefore 


Q £ = P(D + N) f P _1 

= E (^Wn^p- 1 . 


(A.3) 
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If Q is exactly diagonalizable, then N = 0, and so for 
large t, the leading order term in Eq. A.l is proportional 
to q e = e^ log<z , where q < 1 is the largest eigenvalue of 
Q. If Q is not diagonalizable, the leading term will still 
be proportional to e^ logI? , but may also include a polyno¬ 
mial factor in £ due to the binomial coefficient in Eq. A.3. 
However, the polynomial factor only contributes logarith¬ 
mically to the exponent, i.e., £ k e n ° sq = e n ° gq+kXogl , and 
thus we can neglect it for large £. Therefore in general 
we have p(£) ~ e flog9 for large £, and since this sug¬ 
gests mean path length must be £ ~ — 1/ logg, we obtain 
Eq. 13. 

Appendix B: Exact relations between path length and time 
moments using generating functions 


Therefore the moment-generating function for path time 
is 

p OO 

/(s) = / dt f(t) e st 
Jo 

oo 

= ^2p(0(.'<p( s )Y 

(B.7) 

OO 

= ^p(£)e nog ^ s) 

1=0 

= p (i> c (s)), 

while the cumulant-generating function for path time is 


Here we derive exact relations between path length and 
time moments when all states have identical waiting time 
distributions: ip(t\cr) = We define the moment¬ 

generating function for the path length distribution 


/ c (s) = log/(s) 

= logp(V> c (s)) (B.8) 

= Pc (^c(s)) • 


P(s) = ^2 p( 0 e s£ , 

t =o 

so that the moments are 

<£")=p<">( 0), 

where the superscript denotes the derivative: 


(B.l) 


We can obtain moments and cumulants of path time 
by taking derivatives of its generating functions: 


(B.2) 


( r ( ">) = /(”)(()), 



(B.9) 


To express these in terms of the length and waiting time 
moments, we use Faa di Bruno’s formula for derivatives 
of composite functions 3 ': 


P^)=-p( S ) 


(B.3) 


s=0 


The cumulant-generating function is therefore p c (s) = 
logp(s) with 


— <?(h(s)) = 9 {k \Ks))Bn,k{h (1 \s), h (2) (s), 

k= 1 

h (n - fc+1) (s)), (B.10) 


(n c =p^\ o). 


(B.4) 


We similarly define the moment- and cumulant- 
generating functions for the waiting times: 


pOO 

'ijj(s) = dt t l>(t) e st , = V>( n )(0), 

Jo 


(B.5) 


A 0) = log 4>(s), 


0to=$">(O). 


When the waiting time distributions are ip(J) f° r every 
state, the path time distribution is (Eqs. 7 and 14) 


°° p OO p OO 

f( t )='Yp(0 dt 0 ijj(to) / dti ■ ■ ■ 

£=0 *^° 

f- oo / £—1 \ 

x J die-! 1 ) 6 ( t- Y'jiJ • (B.6) 


where B n ,fc are the partial Bell polynomials and super¬ 
scripts again denote derivatives. Thus the path time mo¬ 
ments are 


r (n) ) = f (n \ 0) 


= 


S—0 


= t'PHMQ)) 

k= 1 

xB 4 (i 1 >(0),f(0),..,f-‘ +1 ) 

n 

= Y B n,k (6PA 2 \. . • , 0 (- fc+1 >) . 


k =1 


(B.ll) 


This proves Eq. 24. We can similarly obtain the path 
time cumulants: 
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= /c (n) (0) 



n 

=E^ > (&(°)) 

k =1 


x B n ^ 


(o), $ 2) (o),..., ^ n - fc+1) (o)) 


n 

=j2(n c Bn, k (^,^ 2) ,...,^"- fc+1) )- 

fc=i 

(B.12) 


Appendix C: Approximate relations between path length 
and time moments based on the central limit theorem 


Here we use the central limit theorem to obtain approx¬ 
imate relations between path length and time moments 
when all states have identical waiting time distributions: 
ip(t\cr) = if{t). Equation B.6 gives the general relation 
between the length and time distributions in this case, 
where the nested integrals represent the probability dis¬ 
tribution of the sum of the waiting times. For sufficiently 
long paths, this distribution will be approximately Gaus¬ 
sian 


2n£9, 


( 2 ) 


: exp 


2Wi 2) J ’ 


(C.l) 


and hence 


f{t) « e P( £ ) 


e=o 


2nM 


( 2 ) 


: exp 


( 

't - £9p^ 

V 

2£0f ) 


(C.2) 


From this we can obtain approximate relations between 
the moments. For example, the first two path time mo¬ 
ments are 


(t«) »«<’>(£), 
(t< 2 ))»(9(‘>) 2 <£ 2 > +«?>(£), 


which are in fact identical to the exact result (Eqs. 24 
and 25) as expected. 


Appendix D: Path lengths and times with homogeneous 
exponential waiting time distributions 

When every state has the same exponential waiting 
time distribution ip(t) = 9~ 1 e ~ t ^ e , the sum of the £ wait¬ 
ing times has an Erlang distribution: 


dtn \e~ t0 / e I dt i 4e" tl/e • • • 


1 


9 


9 


t-l 


dti -1 S (t — E t 

m 


i—0 

-1 


-t/e 


{£-1 )!<T ‘ ^ D ' 1 ^ 

Thus the total path time distribution is (using Eq. B.6) 


= <pq 

In this case we can determine the complete distribution 
of times f(t) given the complete distribution of lengths 
pit). We can directly calculate the path time moments 
to be 



J dt f{t) t n 

9 n <£(£ + 1)-■•(£ + «- 1)) 

n 

9 n J2(£ k )K,k |, 


k=l 


(D.3) 


where |s n ,fc| are the unsigned Stirling numbers of the first 
kind 3 7 . The first few moments are 


(r (1) } = 0<£), 

(r (2) } = 0 2 «£ 2 )+ (£)), (D.4) 

(t (3) ) = 9 3 ((£ 3 ) + 3 (£ 2 ) + 2 <£)) . 


This is consistent with the general result in Eq. 24 since 
9cP = (j — 1)! 9i for an exponential distribution and 
B n ,k( 0!, 1!,..., (n — fc)!) = |s„ ife | 37 . 


Appendix E: Proof of recursion relations for moment 
matrices 

We now show that the T)"' matrices generated by the 
recursion relation of Eq. 34 indeed calculate the path time 
moments according to Eq. 33. We first successively apply 
the recursion relation to expand the £th-order matrix in 
terms of lower-order matrices: 
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-<(«) 


Q E 


u- 1=0 


j£-i 




= Q E 

H- 1=0 


n 

J^-i 


© fe_l) Q E ( . ^ -1 j@0«- 2 )T 

i«_ 2 =o ' ■ w_2 ' 


(n-U-x-U-i) 

-2 


n Jt-i n—j£-i—Ji-2 - Ji / \ / \ / 

ee- e C”, ";E-”" J “T 

je-i=0j e -2=0 j o =0 v 2 ' v 

x Q0 fe -’)Q0^- 2) • • • Q eUo) T (n-h-i-h -2 jo) 

V ( ” I Q0C7/- 1 )Q0O‘/-a) . . . Q0(jo) 

,c,, ,, : . 3t 


■Ji 


(E.l) 


where we have invoked the initial condition 

rr\(n—je-l—je-2 jo) _ r 1 fr-n™ 

1 0 - Oo,n- 3 i_ 1 -^_ 2 - jo i trom 

Eq. 31 to obtain the multinomial sum (recall that each 

summation in the multinomial sum is from 0 to n subject 


to the constraint jo + ji + ■ ■ • + jt~ 1 = n). Now we take 
the matrix element of T) ' for the initial distribution 
1 7To) = X^cr 7r o( cr )| (T ) and Of € S’finai, and insert identities 
of the form \ a )( a \ to obtain 


(^iTnVo) = E (■ • U ) E (crf|Q 0 ( ^- l) |crf-l)(°'f-l|Q® (jf ” 2) | cr ^- 2 ) • • • (cri|Q© ( - 7o) |cro) 7r o(c r o) 

= E <cr £ |Q|cr^_i)<cr^_i |Q|ct £ _ 2 > - - - <CTi |Q|cr 0 )7To(cro) 




E 




.3o,3u • • •»J^-i 


6»®°)(cr 0 )6>® 1 )(cr 1 ) • • • 


(E.2) 


Next, we sum over final states on and path lengths i to 
obtain 


E E (c r l T r ?l) ko) = E E E (o-flQK— i)(CTf— i|Q|ct£_ 2 )--- (oi|Q|<to)7To(0-o) 

f —0 a’<6Sfi„a,l t=0 CT^eSfinal <?0 ,&1 , • ■ ■ l -1 


E 


Jo 


Jo, Jl, • • ■, J^-i 


• • • e^-^ioe- 1). (E.3) 


Substituting , P[y>] (Eq. 8), the time moment functional into Eq. E.3, we finally obtain Eq. 33: 
T^[<p\ (Eq. 16), and the sum over paths 


E E H T fVo> = E p k] r(n) k] 

^ fcOcr^eSfinal v> (E.5) 

E = E E E < E - 4 > 

If £=Q CT£G5final 


= (r (n) ). 
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We now show that the recursion relation of Eq. 40 are After inserting identities and using the definition 
correct for any functional of the form in Eq. 22; this will V) = (cr'IQ \o)(JJ(a\ er))- 1 2 3 4 5 6 7 8 9 ', we obtain 

prove the action recursion relation (Eq. 37) as a special 
case. As in Eq. E.l, successive applications of the 
recursion relation yield 


ur= e 


30t3if‘i3l—i 


.3 0, Jl, - ■ -,Jt-ly 

x ■ ■ • n (io) . 


(E.6) 


J 


(^|U^ ) |tto)= E (^cIQI^-iX^-iIQI^- a) - - • <0-1 |Q|c r 0 >7T 0 (cr 0 ) 

<JQ,<7 1 


E E. 

jodlf'di — l 


Jo, ji, ■■-,3i -1 


1 (f 7 (cr£_i,<T£_ 2 ))' 7£ 2 • • • (U(ai,a 0 )) 30 ■ 


(E.7) 


Just as in Eqs. E.3 and E.5 for time moments, we can 
then sum over final states and path lengths to show 
that the matrices are related to the path ensem¬ 

ble averages via Eq. 39. Finally, we explain how to 
use the generalized recursion relation in Eq. 40 to cal¬ 
culate path time moments when the waiting time dis¬ 
tributions are non-separable. In this case we define 
(<r'|fi^cr) = (a'\Q\a)9^ (a —> <r'), where 9^(a —>• a') 
is the j th moment of the non-separable waiting time 
distribution ^(t\(7 -A a'). We can then use the recur¬ 
sion relation of Eq. 40 to calculate the time moments 
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